Abstract

We document Fortran MPI checkerboard code for Markov Chain Monte Carlo simulations of pure SU(3) lattice 
gauge theory with the Wilson action on a D-dimensional double-layered torus. This includes the usual torus with 
periodic boundary conditions as an optional case. We use Cabibbo-Marinari heatbath checkerboard updating. 
Parallelization on sublattices is implemented in all D directions and can be restricted to less than D directions. 
The parallelization techniques of this paper can be used for any model with interactions of link variables defined 
on plaquettes. 

Program Summary 

Program title: STMC2LSU3MPI.

Program identifier: Not yet available. 
Program summary URL: Not yet available. 

Program available from: Temporarily from URL http://www.hep.fsu.edu/~berg/research 

Programming language: Fortran 77 with MPI extensions.

Computer: Any capable of compiling and executing Fortran 77 code with MPI extensions. 

Key words: Markov Chain Monte Carlo, Parallelization, MPI, Fortran, Checkerboard updating, Lattice gauge theory, 
SU(3) gauge group. 

PACS: 02.70.-c, 11.15.Ha 



1. Introduction 

Moore's law [1] appears to be dead. Certainly we have not seen CPU
processor speed going up by a factor of ten in the last five years.
Instead, we now get ten times as many processors (more precisely,
cores) for the price of one five years ago. PCs with 8 cores have
become commodities, and soon one may expect 64 or more. The usefulness
of parallelization is no longer limited to large-scale supercomputer
applications, but becomes relevant for everyday calculations.

This motivates the present paper, which docu- 
ments Fortran 77 MPI checkerboard [2] code for 
Markov Chain Monte Carlo (MCMC) simulations 
of pure SU(3) Lattice Gauge Theory (LGT) with 
the Wilson action on D-dimensional lattices. Sub- 
lattices are updated in parallel after collecting 
boundary variables from other sublattices. The 
introduced parallelization techniques apply to any 
model with dynamical variables defined on links
and their interactions on plaquettes.

The code of this paper implements the Cabibbo- 
Marinari (CM) SU(3) updating [3] using for 
the SU(2) subgroups the heatbath method of 
Fabricius-Haan [4] and Kennedy-Pendleton [5] 
(FHKP), which is more efficient than the older 
Creutz heatbath [6]. CM with FHKP SU(2) up- 
dating is also about three times more efficient than 
Pietarinen's [7] full SU(3) heatbath [8]. 

To synchronize the simulations on all pro- 
cesses, we use FHKP updating in the multi-hit 
accept/reject version [9]. Overrelaxation moves
[10] are presently not implemented, but would fit 
seamlessly into the code. In extension of the usual 
periodic boundary conditions (PBC), which define 
the gauge system on a torus, our code allows for a 
double-layered torus (DLT). These are two iden- 
tical lattices, each using the other as boundary, a 
geometry expected to be of relevance for studies 
of the deconfining phase transition. 

The next section gives an overview of the code and explains Web
access. Section 3 provides a number of verifications. Summary and
conclusions follow in section 4. Runs that reproduce the examples of
this paper and of a companion paper [11] are set up in the code.
Running on up to 1296 CPU cores, the companion paper studies
performance as a function of the number of MPI processes. It also
discusses and resolves problems that were encountered with MPI send
and receive instructions for large arrays.



2. Overview of the Code 

The code for this paper is freely available as a 
gzipped archive 

STMC2LSU3MPI.tgz 

that can be downloaded from the website of the author

http://www.hep.fsu.edu/~berg/research

With

tar -zxvf STMC2LSU3MPI.tgz

the folder structure of Fig. 1 is created. Main programs are located
in ForProg. The Libs folder contains a number of libraries with plain
Fortran 77 and Fortran 77 MPI code. Test and verification runs are set
up in subfolders of several Project folders. Non-MPI SU(3) code and
runs are in the project tree STMCSU3.

Fig. 1. Structure of our program package.

We use checkerboard labeling [2] to divide lattice sites into two sets
of colors $i_c = 1, 2$. Moving one step in any direction changes the
color. For spin models with nearest neighbor interactions this allows
one to update spins at half of the sites in parallel. For SU(3) LGT
the matrices are located on lattice links, and one can update one of
the link directions at half of the sites in parallel. This is employed
to update sublattices in parallel after collecting boundary variables
from other sublattices, which need no updating because they belong to
another checkerboard or link direction. For efficient performance the
sublattice volume to surface ratio, each measured in numbers of
variables, has to be sufficiently large. Examples are discussed in
[11].

To arrange storage of our SU(3) matrices and other physical variables,
we label lattice sites and links following the book of the author
[12]. Corresponding routines from ForLib of this reference are taken
over into Libs/Fortran of Fig. 1. In our approach a lattice site is
specified by a single integer $i_s$, which we call site number. The
dimension of the lattice is given by $D$. The Cartesian coordinates of
a site are chosen to be

$$ x^i = 0, \dots, n^i - 1 \quad \text{for } i = 1, \dots, D. \qquad (1) $$

The site number is defined by the formula

$$ i_s = 1 + \sum_{i=1}^{D} x^i\, n_a^i , \qquad
   n_a^i = \begin{cases} 1 & \text{for } i = 1, \\
   \prod_{j=1}^{i-1} n^j & \text{for } i > 1, \end{cases} \qquad (2) $$

and calculated by the Fortran function isfun.f.
Vice versa, the coordinates for a given site number $i_s$ are obtained
by an iteration procedure, which relies on Fortran integer division
(i.e., 1 = [5/3] = [5/5], 0 = [4/5], etc.). Let
$n_s = \prod_{i=1}^{D} n^i$ be the number of lattice sites. Then,

$$ x^D = \left[ \frac{i_s - 1}{n_a^D} \right], \qquad
   n_a^D = \frac{n_s}{n^D}, \qquad (3) $$

and for $i = D-1, \dots, 1$,

$$ x^i = \left[ \frac{ i_s - 1 - \sum_{j=i+1}^{D} x^j\, n_a^j }{ n_a^i } \right] , \qquad (4) $$

with the $n_a^i$ of Eq. (2). The Fortran subroutine ixcor.f computes
coordinates from the site number, though somewhat differently than by
the formulas written down here.

The site number $i_s$ allows one to store variables at sites in 1D
arrays $A_i(n_s)$, independently of the lattice dimension $D$.
Variables on links are located in 2D arrays $A_i(n_s, nd)$, where the
integer nd is the lattice dimension $D$, nd $\ge$ 2 for LGT. One more
label is required to store SU(3) matrix elements in a 3D array. For
checkerboard labeling we arrange the lattice variables in two arrays,
corresponding to the colors $i_c = 1, 2$. The formula returning the
color assignment of a lattice site is

$$ i_c = 1 + \mathrm{mod}\left( \sum_{i=1}^{D} x^i ,\, 2 \right). \qquad (5) $$

To update variables in array 1, neighbor variables are collected from
array 2, which remains unchanged, and vice versa. LGT also requires
collecting variables from the same checkerboard, which are not
updated, because they are on links in other directions than the one
updated. The checkerboard algorithm requires even lattice extensions.
Otherwise PBC destroy the pattern.
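The labeling above can be tried out with a few lines of code. The following is a minimal Python sketch, not the package's Fortran routines isfun.f, ixcor.f or istoic.f; the function names and the use of 0-indexed Python lists for $x^i$ and $n^i$ are assumptions made for illustration. It inverts the site number with remainder and integer division, lowest direction first, which gives the same coordinates as Eq. (2) implies.

```python
def site_number(x, n):
    """Site number i_s = 1 + sum_i x^i * n_a^i, cf. Eq. (2)."""
    i_s, stride = 1, 1
    for xi, ni in zip(x, n):
        if not 0 <= xi < ni:
            raise ValueError("coordinate out of range")
        i_s += xi * stride   # stride plays the role of n_a^i
        stride *= ni
    return i_s


def coordinates(i_s, n):
    """Recover the coordinates from i_s by integer division."""
    rest = i_s - 1
    x = []
    for ni in n:
        x.append(rest % ni)
        rest //= ni
    return x


def color(x):
    """Checkerboard color i_c = 1 + mod(sum_i x^i, 2), cf. Eq. (5)."""
    return 1 + sum(x) % 2


n = [4, 4, 4, 4]
x = [3, 0, 2, 1]
i_s = site_number(x, n)            # 1 + 3 + 0*4 + 2*16 + 1*64
print(i_s, coordinates(i_s, n), color(x))
```

With even extensions a periodic step from x = 3 to x = 0 changes the coordinate sum by an odd number and therefore flips the color. With an odd extension such as n = 3 the wrap-around step from x = 2 to x = 0 keeps the color, which is the breakdown of the pattern mentioned above.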

2.1. Updating 

Our code implements CM [3] SU(3) updating using for the SU(2)
subgroups the FHKP [4,5] heatbath algorithm. In the original FHKP
version proposals are repeated until one is accepted, which is by
construction drawn from the desired probability distribution. For
parallelization this is inconvenient, because all MPI processes have
to wait until the last one has finished. As pointed out by Fredenhagen
and Marcu [9], one can terminate the inner loop after a finite number
of hits and keep the link matrix at hand when none of the proposals
has been accepted. The new configuration is still proposed with the
local heatbath distribution. What changes is the average stay time of
the old configuration. This time depends on the configuration at hand,
but drops out in the detailed balance equation. By using CM heatbath
in this Metropolis-like fashion the MPI processes get synchronized.
The 1-hit acceptance rate depends on $\beta$ and is around 97% in the
scaling region of SU(3) LGT. Lower 1-hit acceptance rates are
encountered for smaller $\beta$ values. One may then increase the
number of hits.
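The multi-hit idea can be illustrated for the SU(2) component $a_0$. The sketch below is a minimal Python illustration under stated assumptions: it uses the Kennedy-Pendleton proposal only (not the combined FHKP scheme of the package and not its routine su2_a0nhit.f), the function name propose_a0 is hypothetical, and alpha stands for the coefficient in the weight $\sqrt{1-a_0^2}\,\exp(\alpha a_0)$, whose relation to $\beta$ and the staple is not spelled out here.

```python
import math
import random


def propose_a0(alpha, nhit, rng):
    """Make at most nhit Kennedy-Pendleton proposals for a0.

    Returns an accepted a0 in [-1, 1], or None when all nhit proposals
    are rejected (the caller then keeps the old link matrix).
    """
    for _ in range(nhit):
        r1 = 1.0 - rng.random()          # uniform in (0, 1], avoids log(0)
        r2 = rng.random()
        r3 = 1.0 - rng.random()
        c2 = math.cos(2.0 * math.pi * r2) ** 2
        lam2 = -(math.log(r1) + c2 * math.log(r3)) / (2.0 * alpha)
        if rng.random() ** 2 <= 1.0 - lam2:
            return 1.0 - 2.0 * lam2
    return None


rng = random.Random(1)
draws = [propose_a0(20.0, 1, rng) for _ in range(2000)]
accepted = [a for a in draws if a is not None]
print("1-hit acceptance rate:", len(accepted) / len(draws))
```

Because the loop length is bounded by nhit, every process does a comparable amount of work per link, which is the synchronization argument made above.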

As usual, the updating step keeps track of the total action. Due to
parallelization, action differences have to be added by the MPI
process of the sublattice on which the update is carried out. Action
fluctuations across boundaries can then be created, which lead in the
course of time to absurd sublattice contributions, while the total
action (their sum) is still correct. To elaborate on this point, we
first need to define sublattice actions. Due to links crossing
boundaries, there is some amount of freedom in that. We simply
attribute the action of a plaquette to the sublattice that contains
the site from which two forward links of the plaquette emerge. Now,
when one of the other links of a plaquette is updated, the action
change is recorded in a wrong sublattice if the updated link emerges
there.

As long as one is only interested in the total 
action, it is sufficient to recalculate the sublattice 
action once in a while directly to prevent an am- 
plification of rounding errors due to differences of 
large numbers (sublattice contributions can fluc- 
tuate to negative values). However, if one wants 
to attribute physical significance to sublattice ac- 
tions, it is mandatory to recalculate them before a 
measurement is recorded. 

Our updating subroutine is cbsu3_2hbnhit_mpi.f located in Libs/MPISU3.
A call to this routine performs one sweep, which is here defined by
updating each SU(3) matrix once in sequential order (see section 2.4
for more details). Updating in sequential order fulfills balance and
is more efficient than updating link matrices in random order. This
observation holds also for spin models [12].

Fig. 2. 2D Double layered torus.

2.2. Double-Layered Torus 

This section can be skipped by readers who are only interested in
simulations with PBC. The DLT is, for instance, of interest for
simulations on $N_s^3 N_\tau$ lattices if one likes to have boundaries
at a different temperature than the interior of the lattice [13], as
is the case for deconfined volumes created in relativistic heavy ion
collisions. The DLT is defined by two lattices of identical size, each
using the other as boundary in all or just volume directions. In the
latter case distinct $\beta$ values in the lattices lead to different
physical temperatures $T$ through the usual definition
$T = 1/(a N_\tau)$.

Even with identical $\beta$ values in both lattices the DLT has some
intriguing properties, as illustrated in Fig. 2 for a 2D DLT of size
$(N_s)^2$. The boundaries are glued together as indicated by the
arrows$^1$. While for PBC the shortest connection of a point with
itself through the boundary is of length $N_s$, it is now of length
$\sqrt{2}\, N_s$ along the diagonal. The two arrows in diagonal
direction give an example of a line which is closed by DLT boundary
conditions.

$^1$ Note that interchanging the labels 3 and 4 on one of the
lattices of the figure leads to an undesirable situation in
which some site pairs are connected by two links.



Compared to a torus of size $(N_s)^D$, the effective extension of a
DLT with DLT boundary conditions in all directions is

$$ N_s^{\rm eff} = 2^{1/D} N_s , \qquad (6) $$

so that $(N_s^{\rm eff})^D$ is the size of the DLT. One may argue that
finite length corrections are exponentially suppressed by
$\sqrt{2}\, N_s$, which is for $D > 2$ larger than $N_s^{\rm eff}$.
Then one would for $D \ge 3$ gain with respect to the suppression of
finite size effects compared to the usual torus. However, simulations
of the 3D and 4D Ising model on a DLT [14] showed an exponential
suppression of finite size corrections with $N_s^{\rm eff}$ and not
with $\sqrt{2}\, N_s$. The reason for that has remained unclear.
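To put numbers on the two length scales compared above, the following minimal Python sketch evaluates the effective extension $2^{1/D} N_s$ of Eq. (6) and the diagonal length $\sqrt{2}\, N_s$. The function name dlt_lengths is hypothetical and the example values are chosen only for illustration.

```python
import math


def dlt_lengths(n_s, d):
    """Return (effective extension 2**(1/d) * n_s, diagonal sqrt(2) * n_s)."""
    effective = 2.0 ** (1.0 / d) * n_s
    diagonal = math.sqrt(2.0) * n_s
    return effective, diagonal


for d in (2, 3, 4):
    eff, diag = dlt_lengths(8, d)
    # The DLT volume eff**d is twice the single-torus volume 8**d.
    print(d, round(eff, 3), round(diag, 3), round(eff ** d / 8 ** d, 6))
```

For D = 2 both lengths coincide; for larger D the diagonal length exceeds the effective extension, which is the comparison the argument above relies on.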

When using two different values, $\beta_0 \ne \beta_1$, we assign a
unique $\beta_i$, $i = 0, 1$, to each plaquette in a slightly
asymmetrical way: If any link of a plaquette is from the second torus,
we take $\beta_1$, otherwise $\beta_0$. Technically this is done by
tagging all links in the first torus by 0 and in the second by 1. When
considering a plaquette, all these tags are added up. If the sum is
zero, $\beta_0$ is used, otherwise $\beta_1$. So the $\beta_1$ lattice
becomes slightly larger than the $\beta_0$ lattice.
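The tagging rule can be written down in a few lines. This is a minimal Python sketch of the rule as stated in the text, with a hypothetical function name; it does not reproduce how the package stores tags in its arrays.

```python
def plaquette_beta(link_tags, beta0, beta1):
    """Select the coupling of a plaquette from the tags of its four links.

    Tag 0 marks a link of the first torus, tag 1 a link of the second.
    """
    if len(link_tags) != 4 or any(t not in (0, 1) for t in link_tags):
        raise ValueError("expected four tags, each 0 or 1")
    return beta0 if sum(link_tags) == 0 else beta1


print(plaquette_beta([0, 0, 0, 0], 5.65, 5.70))   # plaquette inside torus 1
print(plaquette_beta([0, 1, 0, 0], 5.65, 5.70))   # plaquette touching torus 2
```

A single link from the second torus is enough to select the second coupling, which is why the region governed by it is slightly larger.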

2.3. Parameter files 

As indicated in Fig. 1, runs are kept in subfolders of project
folders, one run per subfolder. The relevant parameters are set in two
files: latmpi.par and mc.par. Before the compile step the parameters
are transferred by a simple preprocessing procedure into subroutines
and, in particular, used to dimension common blocks properly (see
section 2.4). Due to this procedure it is mandatory that runs and
their parameter files are kept two levels down from the STMC2LSU3MPI
root directory.

As an example the parameter files of the run in 

lMPICH/08x08y08z04t5p65b2f3d 
are given below. 

latmpi.par

c Bernd Berg Jan 11, 2009. MPI Checkerboard lattice:
c cfln      Part of data file name.
c lbcex     Boundary exchange T/F.
c lsd2mpi   .true.  distinct random number seeds on each process,
c           .false. identical random number seeds (for tests).
c nd        Dimension of the lattice space.
c nl1       Lattice extension in first direction. Also used as
c           label for the entire lattice. (nl2, nl3, nl4 lattice
c           extensions in directions 2, 3 and 4).
c nltau     Dysfunctional. Planned option for nlat=2 to subdivide
c           the nl4 direction of one torus (gives higher T).
c           Presently not implemented: nltau=nl4 required.
c mxs       Number of space sites, mxsc same per checkerboard.
c ms        Number of sites, msc per checkerboard.
c mlink     Number of links, mlinkc links per checkerboard.
c mp        Number of plaquettes (mpc plaquettes per checkerboard).
      character cfln*6,cfile*18
      parameter(cfln="SU3LGT",lbcex=.true.,lsd2mpi=.true.)
      parameter(nd=4,ndm1=nd-1,n2d=2*nd,nl1=4,nl2=4,nl3=4,nl4=4)
      parameter(nltau=nl4) ! Always (purpose not implemented).
      parameter(mxs=nl1*nl2*nl3,mxsc=mxs/2,ms=mxs*nl4,msc=ms/2)
      parameter(mlink=nd*ms,mlinkc=nd*msc)
      parameter(mp=ms*(nd*(nd-1))/2,mpc=mp/2)
c nptime    Number of timelike plaquettes.
c npspace   Number of spacelike plaquettes.
      parameter(mptime=ms*(nd-1),mpspace=mp-mptime)
c nlat      1 or 2 layers (no other values allowed).
c lat2      must be false for nlat=1, active (false or true) only for
c           nlat=2: .true.  boundary exchange between DLT layers;
c                   .false. no boundary exchange between DLT layers.
c lat2test  .true. sets identical random numbers in both layers.
c ndmpi     Dimension of the MPI lattice.
c mpifactor Number of sublattices per ndmpi direction.
c msmpi     Total number of MPI sublattices (MPI processes) per layer.
c mpmpi     Total number of plaquettes over all sublattices (one layer).
c mscb      Size of one checkerboard boundary.
c mbcs      Extra storage size for boundaries, which
c           enters definition of nmat below.
c mbcsh     Extension of pointer array sizes to gather boundaries.
c noffset   For definition of receive position in 1. gather/scatter.
      parameter(nlat=1,lat2=.false.,lat2test=.false.)
      parameter(ndmpi=3,mpifactor=2,msmpi=mpifactor**ndmpi)
      parameter(n2dmpi=2*ndmpi,mpmpi=msmpi*mp)
      parameter(mscb=nl2*nl3*nl4/2,mbcs=n2dmpi*mscb)
      parameter(mbcsh=mbcs/2,noffset=msc+ndmpi*mscb)
c nddmpi    Number of combinations of two ndmpi directions.
c nslfbb    Number of matrices for gather in 2. gather/scatter,
c           which is for corner plaquettes.
c nsfbb     For check on number of corner plaquettes in cblgtpnt2.f.
c           It enters definition of nmat below.
c nmat      Number of SU3 matrices including those gathered from
c           neighbouring sublattices.
c nl8       Number of SU3 matrix elements.
      parameter(nddmpi=ndmpi*(ndmpi-1),nslfbb=nl3*nl4/2)
      parameter(nsfbb=nddmpi*nslfbb,nmat=msc+mbcs+nsfbb,nl8=18)
C Array sizes in latmpi.dat have to be consistent!!!

Central are the sublattice extensions nl1, nl2, nl3, nl4 in 4D, and
the MPI parameters ndmpi, mpifactor. For mpifactor = 1 there is only
one process and the entire lattice agrees with the sublattice. For
mpifactor > 1 the extensions of the entire lattice agree with those of
the sublattice in directions larger than ndmpi, which exist for
ndmpi < nd, and there are mpifactor sublattices in each of the ndmpi
directions. The sublattices themselves form a lattice of dimension
ndmpi with

msmpi = mpifactor**ndmpi                                          (7)

points, which we refer to as MPI lattice. To allow for variable
extensions, the sublattice values are stored in an array nla. To make
use of the same routines, the MPI lattice extension mpifactor is
similarly stored in an array nla_mpi. Both arrays are initialized in
the file latmpi.dat:

      data nla/nl1,nl2,nl3,nl4/,nla_mpi/ndmpi*mpifactor/

Usual PBC are simulated for nlat = 1, the DLT for nlat = 2. For PBC
the number of MPI processes is msmpi, while for the DLT it is 2*msmpi.
We will get familiar with choices of other latmpi.par parameters when
we perform verification and test runs in the next section.
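The bookkeeping of the parameter file can be checked independently. The following is a minimal Python sketch that mirrors some of the parameter statements of latmpi.par; the function name and the dictionary keys are assumptions for illustration, and the preprocessing of the package itself is not reproduced.

```python
def lattice_parameters(nl, ndmpi, mpifactor, nlat=1):
    """Derived sizes for sublattice extensions nl (one entry per direction)."""
    nd = len(nl)
    if not 0 < ndmpi <= nd or mpifactor < 1 or nlat not in (1, 2):
        raise ValueError("inconsistent parameters")
    ms = 1
    for n in nl:
        ms *= n                        # sites per sublattice
    msmpi = mpifactor ** ndmpi         # sublattices per layer, Eq. (7)
    # Only the first ndmpi directions are subdivided among processes.
    full = [n * mpifactor if i < ndmpi else n for i, n in enumerate(nl)]
    mscb = (ms // nl[0]) // 2          # as mscb=nl2*nl3*nl4/2 in latmpi.par
    return {
        "full_lattice": full,
        "ms": ms,
        "msc": ms // 2,
        "mlink": nd * ms,
        "mp": ms * nd * (nd - 1) // 2,
        "msmpi": msmpi,
        "processes": nlat * msmpi,
        "mscb": mscb,
        "mbcs": 2 * ndmpi * mscb,
    }


print(lattice_parameters([4, 4, 4, 4], ndmpi=3, mpifactor=2))
```

For the example file this gives an entire lattice of 8 x 8 x 8 x 4 sites on 8 processes, in line with the run folder name 08x08y08z04t. The extra boundary storage mbcs = 192 is larger than the msc = 128 sites of one checkerboard, which illustrates why the sublattice volume to surface ratio matters for such small sublattices.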
The example file for mc.par is:

c Bernd Berg, Jan 11, 2009.
c Definition of parameters for SU(3) LGT simulations.
c Output units iuo, iud1, iud2 (off/on with lud2), iud3.
c Job number njob and seeds iseed1, iseed2.
      parameter(iuo=6,iud1=11,iud2=12,lud2=.false.,iud3=13)
      parameter(njob=1,iseed1=njob,iseed2=0)
c Parameters for equilibrium and data production sweeps:
c beta0,1: beta_g=2/g^2 inverse bare coupling.
c istart:  1 ordered start all matrices 1; 2 disordered start.
c nhit:    Number FHKP proposals made.
c nreq:    Number of repetitions of nequi equilibrium sweeps.
c nequi:   Number of equilibrium sweeps.
c nrpt:    Number of repetitions of nmeas measurement.
c nsw:     Number of sweeps between measurements.
c nmeas:   Number of measurement sweeps per repetition.
      parameter(beta0=5.65d00,beta1=beta0,istart=1,nhit=1)
      parameter(nreq=1,nequi=2**12,nrpt=32,nsw=2,nmeas=nequi)

The purpose of most parameters should be obvious from the comments.
The MCMC run structure is that defined by nequi, nrpt and nmeas in
Ref. [12]. After equilibration, measurements are saved in nrpt blocks
to allow for a conveniently binned analysis, employing jackknife
methods when suitable. There are nsw sweeps done between
measurements.
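As an illustration of the run structure and of a binned analysis, the following minimal Python sketch counts the sweeps implied by the parameters and computes a jackknife error bar for the mean of block averages. It is a generic textbook jackknife under the assumption of one average per nrpt block, not the analysis code that accompanies Ref. [12]; both function names are hypothetical.

```python
import math


def total_sweeps(nreq, nequi, nrpt, nmeas, nsw):
    """Equilibration sweeps plus sweeps of the measurement loops."""
    return nreq * nequi + nrpt * nmeas * nsw


def jackknife_mean_error(blocks):
    """Mean of the block averages and its jackknife error bar."""
    n = len(blocks)
    if n < 2:
        raise ValueError("need at least two blocks")
    total = sum(blocks)
    mean = total / n
    # Jackknife bins: averages with one block left out at a time.
    jack = [(total - b) / (n - 1) for b in blocks]
    variance = (n - 1) / n * sum((j - mean) ** 2 for j in jack)
    return mean, math.sqrt(variance)


print(total_sweeps(nreq=1, nequi=2**12, nrpt=32, nmeas=2**12, nsw=2))
print(jackknife_mean_error([1.0, 2.0, 3.0, 4.0]))
```

For the plain mean the jackknife error coincides with the usual standard error of the block averages; the jackknife becomes useful when a nonlinear function of the averages is estimated.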

2.4. Program structure 

We trace the code structure and that of a typical run from the main
program

cbsu3_dlt{a, b, c}.f.                                             (8)

The program comes in three versions {a, b, c}, where b is obtained
from a by simply replacing everywhere in the code mpia by mpib, and
similarly for c. As discussed in [11], differences lie in the coding
of MPI send and receive instructions. We were unable to find a single
solution which works on all MPI platforms on which we performed tests.
The a version, which uses the simplest (plain) subroutines for
boundary transfers, is listed in the following.

      program cbsu3_dlta  ! Berg Jan 11 2009.
C MPI checkerboard for SU3 nhit with Cabibbo-Marinari. Periodic
C boundary conditions and double-layered torus with two beta values.
C Version a: Plain send/receive (mpia extensions).
C Version b: Buffered send/receive (replace mpia by mpib).
C Measurements of action, spacelike, timelike plaquettes.
C Recalculation of action before each measurement is only
C needed for nlat=2, lat2=.true..
      include '../../Libs/Fortran/implicit.08'
      include 'mpif.h'
      include '../../Libs/Fortran/constants.08'
      character cmy*4
      include 'latmpi.par'
      include 'mc.par'
      include '../../Libs/MPI_par/common_cblat.f'
      include '../../Libs/MPISU3/common_cbsu3.f'
      ltest=.true.
      ltest=.false.
      call mpi_init(ierr)
      call mpi_comm_rank(mpi_comm_world,my_id,ierr)
      if(my_id.eq.0) call
     & write_progress(iud1,"mpia: nreq,nequi,one:",nreq,nequi,one)
c
      if(my_id.eq.0) write(iuo,'(/," MPI cbsu3_dlt: nlat,",
     & "nhit,beta0,beta1 =",I2,I3,2F12.8)') nlat,nhit,beta0,beta1
      if(nhit.le.0.or.nhit.gt.3) ltest=.true.
c
      write(cmy,'(i4.4)') my_id
      if(lud2) open(iud2,file="MPI"//cmy//".d",form='formatted',
     & status='unknown')
      call cbsu3_2init_mpi  ! Initialization.
      if(lud2) write(iud2,'(" cbsu3_dlt: cbsu3_2init_mpi done.")')
      if(ltest) then
        if(my_id.eq.0) write(iuo,'(/," cbsu3_dlt_mpi: ltest",/)')
        if(lud2) write(iud2,'(/," cbsu3_dlt_mpi: ltest.",/)')
        if(lud2) close(iud2)  ! iud1 should already be closed.
        call mpi_finalize(ierr)
        stop "cbsu3_dlt: ltest."
      endif
      call cbsu3_actdif_mpi(actdif,actsum1,actsum2,my_id)
      if(lud2) write(iud2,'(" cbsu3_dlt: actdif ok after start.")')
      if(my_id.eq.0) write(iuo,'(/," call write_act_mpi:")')
      call write_act_mpi(my_id,izero,acpt,actdif,actsum1,actsum2)
      call mpi_barrier(mpi_comm_world,ierr)
c
      if(lud2) write(iud2,'(/," Equi: nreq,nequi =",2I10)') nreq,nequi
      if(my_id.eq.0) write(iuo,'(/," Equilibration started...")')
      do ireq=1,nreq
        if(my_id.eq.0) call
     &  write_progress(iud1,"ireq,nreq,act:",ireq,nreq,act)
        do iequi=1,nequi  ! Sweeps for reaching equilibrium.
          call cbsu3_2hbnhit_mpi(my_id)  ! Heatbath DLT.
        end do  ! Check action:
        call cbsu3_actdif_mpi(actdif,actsum1,actsum2,my_id)
        acpt=acpt/a0prop
      end do
      call write_act_mpi(my_id,izero,acpt,actdif,actsum1,actsum2)
      if(lud2) write(iud2,'(/," Equilibration done acpt =",F10.4)') acpt
      if(my_id.eq.0) write(iuo,'(/," Equilibration, actdif done,",
     & " acpt,a0prop =",F10.4,G15.6)') acpt,a0prop
      if(lud2) write(iud2,'(" actdif,act1,act2:",3G15.6)')
     & actdif,(actsum1/mpmpi),(actsum2/mpmpi)
      call mpi_barrier(mpi_comm_world,ierr)
c
c Writing header information into file time series (action)
      open(iud1,file=cfln//cfile//".D",form="unformatted",
     & status="unknown")
      write(iud1) beta0,beta1,nd,nlat,ms,mlink,nla,ndmpi,msmpi,nla_mpi,
     & nreq,nequi,nrpt,nmeas,nsw
      close(iud1)
      if(my_id.eq.0) write(iuo,'(/,1X,"irpt,act,actdif,acpt,tsa(1),",
     & "tspr(1),tspi(1),tsklr(1),tskli(1):",/)')
      if(lud2) write(iud2,
     & '(/,5X,"irpt,action/mp,rounding error,acpt rate:",/)')
      do irpt=1,nrpt  ! Repetitions.
        iact=nint(act)
        if(my_id.eq.0) call
     &  write_progress(iud1,"irpt,iact,acpt:",irpt,iact,acpt)
        acpt=zero
        a_min=act
        a_max=act
        do imeas=1,nmeas  ! Measurements loop.
          do isw=1,nsw
            call cbsu3_2hbnhit_mpi(my_id)  ! SU3 Cabibbo-Marinari.
          end do
          call cbsu3_actdif_mpi(actdif,actsum1,actsum2,my_id)
          tsa(imeas)=act/mp  ! Data collection (measurement).
          call cbsu3_wloops_mpi(imeas,my_id)
        end do
        acpt=acpt/a0prop
        call cbsu3rw_meas(irpt,iud1,iud3,ione)  ! Write measurements.
        if(lud2) write(iud2,'(I9,2G16.7,F9.3)') irpt,act/mp,actdif,acpt
        if(my_id.eq.0) then
          write(iuo,'(I6,4G15.6)') irpt,act,actdif,acpt,tsa(1)
          write(iuo,'(6X,4G15.6)') tspr(1),tspi(1),tsklr(1),tskli(1)
        endif
        call mpi_barrier(mpi_comm_world,ierr)
      end do
      call write_act_mpi(my_id,izero,acpt,actdif,actsum1,actsum2)
c
      close(iud1)
      if(lud2) close(iud2)
      call mpi_finalize(ierr)
      stop
      end

      BLOCK DATA
      include '../../Libs/Fortran/implicit.08'
      include '../../Libs/Fortran/constants.08'
      include 'latmpi.par'
      parameter(npointer=nd*(msc+mbcsh))
      include '../../Libs/MPI_par/common_cblat.f'
      include 'latmpi.dat'
C For test purposes only:
      data ipf1/npointer*mione/,ipf2/npointer*mione/
      data ipb1/npointer*mione/,ipb2/npointer*mione/
      END



C Modular Fortran routines:
      include '../../Libs/Fortran/isfun.f'
      include '../../Libs/Fortran/ipointer.f'
      include '../../Libs/Fortran/ixcor.f'
      include '../../Libs/Fortran/lat_init.f'
      include '../../Libs/Fortran/nsfun.f'
      include '../../Libs/Fortran/razero.f'
      include '../../Libs/Fortran/ranmar.f'
      include '../../Libs/Fortran/rmafun.f'
      include '../../Libs/Fortran/rmaset.f'
      include '../../Libs/Fortran/rmasave.f'
      include '../../Libs/Fortran/sum_fun.f'
      include '../../Libs/Fortran/write_progress.f'
C Checkerboard:
      include '../../Libs/Fortran/istoic.f' ! Checkerboard ic from is.
C Routine:
      include '../../Libs/Fortran/su2_a0nhit.f' ! Generates SU2 a0.
C SU3 modular routines:
      include '../../Libs/SU3/su3init0.f' ! SU3 matrix put to 0.
      include '../../Libs/SU3/su3init1.f' ! SU3 matrix put to 1.
      include '../../Libs/SU3/su3add_m_m.f'
      include '../../Libs/SU3/su3addb_m_m.f'
      include '../../Libs/SU3/su3copy_m_m.f'
      include '../../Libs/SU3/su3mult_m_mh_m.f'
      include '../../Libs/SU3/su3mult_m_m_m.f'
      include '../../Libs/SU3/su3reunit.f'
      include '../../Libs/SU3/su3_b2nhitupdt.f' ! nhit update.
      include '../../Libs/SU3/csu3_start.f' ! Start configuration.
C Modular MPI routines:
      include '../../Libs/MPI/lat2a_init.f' ! Double-layered torus:
      include '../../Libs/MPI/lat2b_init.f' ! Couple lattice 1 and 2.
C LGT MPI checkerboard routines:
      include '../../Libs/MPI_par/cblat_init.f'
      include '../../Libs/MPI_par/cblgtpnt2.f' ! Final LGT pointers.
      include '../../Libs/MPI_par/cblgtpointer.f'
      include '../../Libs/MPI_par/cbpointer.f'
      include '../../Libs/MPI_par/cpointer.f'
      include '../../Libs/MPI_par/isglobal.f'
      include '../../Libs/MPI_par/write_act_mpi.f' ! Write act.
C SU3 MPI checkerboard routines:
      include '../../Libs/MPISU3/cbsu3_act_mpi.f' ! Calculate action.
      include '../../Libs/MPISU3/cbsu3_actdif_mpi.f' ! Check action.
      include '../../Libs/MPISU3/cbsu3_bnd_mpi.f' ! Gather boundary.
      include '../../Libs/MPISU3/cbsu3_bnd1a_mpia.f' ! Gather boundary.
      include '../../Libs/MPISU3/cbsu3_bnd1b_mpia.f' ! Gather boundary.
      include '../../Libs/MPISU3/cbsu3_bnd2a_mpia.f' ! Gather boundary.
      include '../../Libs/MPISU3/cbsu3_bnd2b_mpia.f' ! Gather boundary.
      include '../../Libs/MPISU3/cbsu3_wloops_mpi.f' ! Measures Wloops.
      include '../../Libs/MPISU3/cbsu3rw_meas.f' ! R/W measurements.
C Double-layered torus SU3 routines:
      include '../../Libs/MPISU3/cbsu3_2init_mpi.f' ! Initialize SU3.
      include '../../Libs/MPISU3/cbsu3_2hbnhit_mpi.f' ! MCHB updating.
      include '../../Libs/MPISU3/cbsu3_bstaple1.f' ! Staple checkb. 1.
      include '../../Libs/MPISU3/cbsu3_bstaple2.f' ! Staple checkb. 2.
      include '../../Libs/MPISU3/cbsu3_iba_mpi.f' ! Define iba arrays.

In the first lines of the program, after the comments, the general
structure is defined. Variables are declared throughout the entire
code by including the implicit.08 file of the Fortran library folder:

      implicit real*8 (a-h,o-z)
      implicit logical (l)

This has the advantage that the type of a variable follows from its
first letter. An exception to this rule are character variables, which
are explicitly declared, though their first letter is always c. No
complex variables are used. MPI is set up by including the system
provided file mpif.h, and a number of constants are defined by
including the file constants.08 (see inside the file).

The program is compiled by a file mpimake, or similar, of which a copy
is located in each project folder and listed here as used for Open
MPI$^2$.

cp *.par ../../Libs/Fortran_par/.
mpif77 -O -Wall $1
rm ../../Libs/Fortran_par/*.par

The mpimake command transfers the parameter 
files into the Fortran_par folder and removes them 
from there after the compile step. This creates 
a hyperstructure, which transfers to all subrou- 
tines identical parameter values and dimensions 
common blocks properly. As already mentioned, 
to keep this structure intact runs must be carried 
out in subfolders, which are two levels down from 
STMC2LSU3MPI. Job submission is subsequently 
done by run* executables, which are kept in the 
run subfolders. 



2 Similar mpichmake files are included in the MPICH fold- 
ers. For runs on the Cray the compile step is in the job 
q.run* control cards, because of the queuing system there. 
In our Open MPI installation we had to use the b version 
of the program, c on the Cray, while the a version is suffi- 
cient for some of the runs we performed on the Cray and 
all the MPICH runs documented in this paper. 



All library routines needed by the program are 
explicitly included at the end of the main program. 
So their source code can be easily located. Excep- 
tions are calls to MPI routines (all routines with 
names starting with mpi), which have to be looked 
up in MPI manuals or tutorials (for instance [12], 
see [11] for subtle points with send and receive). 
Step by step the execution of a run is explained in 
the following. 

(i) MPI initialization by mpi_init. 

(ii) Calculation of the rank (identity my _id of the 
MPI process) by mpi_commjrank. 

(iii) Some printout from MPI process zero, setup 
of printout for each process if lud2 is true. 

(iv) A call to cbsu3_2init_mpi initializes the run, 
setting up many important features: 

(a) A call to rmaset initializes Marsaglia's 
(pseudo) random number generators 
[15,12] used throughout this code. For 
lsd2mpi true the process rank is in- 
voked in the seed, so that a different 
generator is used for each MPI process. 

(b) Definition of pointer arrays for checker- 
board labeling and exchange of bound- 
aries by calls to lat_init, for nlat=2 
also to lat2a_init and lat2b_init, 
then to cblat_init, for lbcex true 
(means MPI boundary conditions ex- 
change) to cblgtpointer and, finally, 
for ndmpi>2 to cblgtpnt2. 

(c) For the DLT a call to cbsu3_iba_mpi as- 
signs a unique (3 to each plaquette. This 
routine has to be called before the start 
configuration is initialized, because it 
uses the SU(3) matrix array for tempo- 
rary storage. Tags are finally stored in 
the arrays ibal and iab2 of the com- 
mon block common_cbsu3 . f and point- 
ers to the /3 values in the array ba(0 : 4) . 

(d) A call to csu3_start generates a SU(3) 
start configuration. 

(e) A call to cbsu3_act_mpi calculates the 
initial action. 

(v) Calls to cbsu3_actdif _mpi check whether 
the action kept on record during the updat- 
ing process agrees with the one obtained by 
direct calculation. Process writes action 
information to the formatted output file 
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(unit iuo) through calls to write_act_mpi. 

(vi) Calls to mpi_barrier are supposed to syn- 
chronize the MPI processes, but may indeed 
have no effect. 

(vii) Calls to write_progress by MPI process 
write information to a file progress . d, which 
is opened and closed, so that the user can 
look up the file during run time. 

(viii) For equilibration a double loop (nreq and 
nequi) of calls to the updating routine 
cbsu3_2hbnhit_mpi is performed. The pur- 
pose of a double loop is that the run can 
be interrupted when the total equilibration 
time exceeds the CPU time allowed for a 
single run. The updating routine relies on a 
number of subroutines: 

(a) cbsu3_bstaplel calculates the staple 
for updating a link matrix on checker- 
board 1 (cbsu3_bstaple2 correspond- 
ingly on checkerboard 2). These rou- 
tines use various matrix manipulation 
routines from Libs/SU3. 

(b) su3mult_m_m_m multiplies SU(3) matri- 
ces of the first two arguments and re- 
turns the result in the third argument. 

(c) su3reunit reunitarizes a SU(3) matrix. 

(d) cbsu3_bnd1a_mpia collects boundaries
(no corners) from a sublattice checkerboard 1
and sends them to other sublattices
(cbsu3_bnd2a_mpia for collection
from checkerboard 2). A subtle point
in gauge systems is that for ndmpi>1
one needs corner links from two neighboring
sublattices, like the links emerging
from sites 2 and 3 in Fig. 3. This is
handled by one more routine:

(e) cbsu3_bnd1b_mpia collects for ndmpi>1
boundary corners from checkerboard 1
and sends them to other sublattices
(cbsu3_bnd2b_mpia for collection from
checkerboard 2).

(ix) Updating sweeps with measurements are
carried out in a triple loop (nrpt, nmeas
and nsw). Measurements are done every
nsw sweeps and kept in time series arrays
of length nmeas. Writing reasonably sized
unformatted arrays to disk is considerably
faster than writing after each measurement
step. Increasing nsw prevents strongly correlated
measurements. A good choice for
nsw is between 1% and 10% of the expected
integrated autocorrelation time $\tau_{\rm int}$ [12],
which depends not only on $\beta$ and the lattice
size, but also on the observable. Using too
large a value for nsw destroys the possibility
to estimate $\tau_{\rm int}$ from the run data.

(x) Measurements are temporarily stored in arrays
of the common block common_cbsu3. For
spacelike and timelike plaquettes they are
done by cbsu3_wloops_mpi and kept in the
time series (ts) arrays tsws and tswt.

Fig. 3. Links emerging at sites 2 and 3 are needed for
updates of links emerging from site 1 (the broken lines
indicate a division into sublattices).
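The triple loop of item (ix) can be sketched as follows. This is only a schematic illustration in Python, not the Fortran code of the package: the callbacks sweep, measure and write_block are hypothetical stand-ins for the updating routine, the plaquette measurement and the unformatted write of one time series array.

```python
def run_measurements(nrpt, nmeas, nsw, sweep, measure, write_block):
    """Triple loop: nrpt blocks, nmeas measurements per block, nsw sweeps each."""
    for irpt in range(nrpt):
        ts = []  # time series buffer, reaches length nmeas
        for _ in range(nmeas):
            for _ in range(nsw):
                sweep()           # one updating sweep (hypothetical callback)
            ts.append(measure())  # measure only every nsw sweeps
        write_block(irpt, ts)     # a single write per block of nmeas values


# Toy stand-ins (assumptions for illustration): count sweeps, no lattice.
state = {"sweeps": 0}
blocks = []

def toy_sweep():
    state["sweeps"] += 1

def toy_measure():
    return state["sweeps"]

def toy_write(irpt, ts):
    blocks.append(list(ts))

run_measurements(nrpt=3, nmeas=4, nsw=2,
                 sweep=toy_sweep, measure=toy_measure, write_block=toy_write)
print(state["sweeps"], len(blocks), blocks[0])  # 24 3 [2, 4, 6, 8]
```

The total number of sweeps is nrpt * nmeas * nsw, while the number of write operations is only nrpt.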



3. Verifications 

Although our code is written for a variable 
lattice dimension D, tests have so far been lim- 
ited to 4D. The programs and routines are only 
moderately cleaned up. Many parts have disabled
(ltest=.false.) or commented out test options.
They are presently left in the code, because they
could come into use again.

This section deals with verifications, which were
performed on a 2 GHz AMD Athlon 64 X2 Dual
Core Processor 3600+ at Leipzig University. MPI
runs with mpifactor > 1 use both processors, single
processor runs one of them. Fortran 77 compilation
was done with the g77 compiler based on
gcc version 4.3.2 (Debian 4.3.2-1.1). MPI runs were
performed with MPICH version 1.2.7p1. Compiler
warnings about slow initialization of large aggregate
areas have been ignored as the produced code
works just fine. More MPI runs using up to 16 cores
on a PC cluster and up to 1296 on a Cray are
documented in [11].

A strong test for the correct implementation of 
exchange of boundaries is provided by using iden- 
tical random numbers on each sublattice. Then re- 
sults from all sublattices have to agree and be iden- 
tical with a run on a single lattice of this size with 
PBC. After such tests, real production runs were 
performed to compare action expectation values 
with results from the literature [16] and from our 
conventional (non-MPI) SU(3) code used before in 
Ref. [13] (they are set up in the STMCSU3 project
folder in essentially the same way as the MPI
programs in the main tree).

Because there are no IEEE (Institute for Electri- 
cal and Electronics Engineers) standards for For- 
tran functions, the precise numbers obtained in 
trial runs depend on the computing platform due 
to rounding errors, which lead at some point to dis- 
tinct accept/reject steps. For averages agreement 
in the statistical sense has to hold. This is still very 
restrictive as the statistical errors are often small. 

3.1. Periodic boundary conditions 

This section deals with verifications for simula- 
tions with PBC, i.e., the parameter values 

nlat = 1 and lat2 = .false. . 

All parameters of specific runs are kept in the sub- 
folders of the project 

1MPICH . 

3.2. Identical random numbers on sublattices 
With the 

lsd2mpi = .false. 

option identical random numbers are used in all 
sublattices. We performed such simulations on $4^4$
sublattices with parameters

nmeas = nequi = $2^{12}$, nrpt = 32, nsw = 2 (9)

at $\beta = 5.5$, 5.6 and 5.7 using ordered and disordered
starts. The average actions of the MPI runs
are obtained by running the analysis program 

ana2sublatsu3.f (10) 

and the values were found in statistical agreement 
with those from a conventional single processor 
SU(3) Fortran program: the average over the Q
values of Gaussian difference tests [12] between
these six runs was close to 0.5, as it should be.
We document here only the

$\beta = 5.6$ runs with ordered starts.

The analysis is kept in ana2.txt files. We obtained
for the mean action per plaquette with MPI
code (error bars are given in parentheses and always
rounded upwards in their second digit)

act = 0.53811 (19), (11) 

versus with single processor code 

act = 0.53770(18), (12) 

leading to an acceptable Q = 0.12 in the Gaussian
difference test. The integrated autocorrelation
time of these runs is estimated to be
$\tau_{\rm int} = 49.5\,(3.4)$. So an error bar calculation
with respect to 32 bins is appropriate.
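For orientation, the Q value quoted above can be reproduced from the two action estimates (11) and (12). The sketch below assumes the usual two-sided form of the Gaussian difference test, Q = erfc(|x1 - x2| / (sqrt(2) * sigma)) with sigma^2 = e1^2 + e2^2; the function name is chosen here for illustration and is not a routine of the package.

```python
import math

def gaussian_difference_q(x1, e1, x2, e2):
    """Probability that two Gaussian estimates differ by at least
    |x1 - x2| purely by chance (two-sided test)."""
    sigma = math.sqrt(e1 * e1 + e2 * e2)  # error bar of the difference
    return math.erfc(abs(x1 - x2) / (sigma * math.sqrt(2.0)))

# Mean action per plaquette: MPI code (11) versus single processor code (12).
q = gaussian_difference_q(0.53811, 0.00019, 0.53770, 0.00018)
print(f"Q = {q:.2f}")  # Q = 0.12
```

Since the printed error bars are rounded upwards, a Q value recomputed from them can differ slightly from one computed with the unrounded error bars.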

MPI runs were repeated for the pairs (1,1), (2,1), 
(2,2), (2,3), (2,4), (3,1), (3,2) of the parameters 

(mpifactor, ndmpi) (13)

always giving (because lsd2mpi is false) the same
average action (11). The corresponding numbers of
MPI lattice points (MPI processes) msmpi (7) are 
1, 2, 4, 8, 16, 3 and 9. 
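The listed process numbers are consistent with msmpi = mpifactor^ndmpi, which is assumed here to be the content of (7). A short check in Python:

```python
# Pairs (mpifactor, ndmpi) of Eq. (13).
pairs = [(1, 1), (2, 1), (2, 2), (2, 3), (2, 4), (3, 1), (3, 2)]

# Assumption: the number of MPI processes is mpifactor to the power ndmpi.
msmpi = [mpifactor ** ndmpi for mpifactor, ndmpi in pairs]
print(msmpi)  # [1, 2, 4, 8, 16, 3, 9]
```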

Parameters of the runs of this section are kept in

{F}nnxnnynnznntnpnbnfnd (14)

subfolders of 1MPICH, where F indicates lsd2mpi =
.false. and is omitted for lsd2mpi = .true.. The
letters n indicate numbers, which can be different.
Lattice extensions are given by nnx, nny, nnz and
nnt. This is followed by npnb for $\beta_0 = \beta_1 =$ n.n,
by n from nf for mpifactor = n, and n from nd
for ndmpi = n. In the folder names extensions of
the full lattice are used, whereas data set names
created by the program (8) are of the form

SU3LGTndnfndnnnxnnntnnnn.D , (15)

showing sublattice extensions of the x and t directions.
The program also calculates the extensions
of the full lattice from latmpi.par and prints them
in the readable output file. Another way to find
sublattice extensions is from the folder name by
dividing the full lattice extensions by mpifactor for
the ndmpi directions. The results have to be integers
without remainder. As folder names are created
by hand, the output file from the run is authoritative
in case of a discrepancy.
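The division rule can be illustrated by a small Python sketch. It is not part of the package: the regular expression only covers 4D folder names of the form (14), and it is assumed (as the tables of this section suggest) that the first ndmpi directions are the ones that are split.

```python
import re

# 4D folder names of form (14); a leading F marks lsd2mpi = .false.
FOLDER = re.compile(r"^(F?)(\d+)x(\d+)y(\d+)z(\d+)t(\d+)p(\d+)b(\d+)f(\d+)d$")

def parse_folder(name):
    """Extract run parameters from a folder name (hypothetical helper)."""
    m = FOLDER.match(name)
    if m is None:
        raise ValueError(f"not a folder name of form (14): {name}")
    flag, nx, ny, nz, nt, b_int, b_frac, nf, nd = m.groups()
    return {
        "lsd2mpi": flag != "F",
        "full": (int(nx), int(ny), int(nz), int(nt)),
        "beta": f"{b_int}.{b_frac}",
        "mpifactor": int(nf),
        "ndmpi": int(nd),
    }

def sublattice_extents(full, mpifactor, ndmpi):
    """Divide the full extensions by mpifactor in the split directions."""
    sub = list(full)
    for mu in range(ndmpi):  # assumption: the first ndmpi directions are split
        if sub[mu] % mpifactor != 0:
            raise ValueError("full extension is not a multiple of mpifactor")
        sub[mu] //= mpifactor
    return tuple(sub)

run = parse_folder("08x08y08z04t5p65b2f3d")
sub = sublattice_extents(run["full"], run["mpifactor"], run["ndmpi"])
print(run["full"], "->", sub)  # (8, 8, 8, 4) -> (4, 4, 4, 4)
```

A division with remainder raises an error, which mirrors the requirement that the results have to be integers.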

The other acronyms in the data set name (15)
are the lattice dimension D = n of the first nd,
then mpifactor = n from nf, and the MPI lattice
dimension ndmpi = n from the second nd. The
extension nnnn.D labels data files by their process
number. Each of the created data files corresponds
to one of the sublattices. After data production,
the 1-processor MPI program

su3datcollect.f (16)

condenses these data files into a single one for
which the extensions tnnnn.D are reduced to t.D.
When disk space fills up it is sufficient to keep
only the *t.D files.

To give an example, the subfolder name

F08x08y04z04t5p6b2f2d

corresponds to lsd2mpi = .false. runs on $4^4$ sublattices
at $\beta = 5.6$ with a full lattice size $8^2\,4^2$. The
MPI run produces four sublattice data sets

SU3LGT4d2f2d004x004tnnnn.D

with nnnn from 0000 to 0003. After data collection
with (16) the file

SU3LGT4d2f2d004x004t.D

results, which can be analyzed further.
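The data set names of this example can be generated with a few lines of Python. The helper below is hypothetical; the field widths (three digits for the sublattice extensions, four for the process number) are inferred from the example, and the number of processes is assumed to be mpifactor^ndmpi as for runs with nlat = 1.

```python
def data_set_names(nd, mpifactor, ndmpi, nx_sub, nt_sub):
    """Return the per-process data set names of form (15) and the
    name of the collected file (hypothetical helper)."""
    stem = f"SU3LGT{nd}d{mpifactor}f{ndmpi}d{nx_sub:03d}x{nt_sub:03d}t"
    nproc = mpifactor ** ndmpi  # assumption: one process per sublattice
    per_process = [f"{stem}{ip:04d}.D" for ip in range(nproc)]
    return per_process, stem + ".D"

files, collected = data_set_names(nd=4, mpifactor=2, ndmpi=2, nx_sub=4, nt_sub=4)
for name in files:
    print(name)
print(collected)
```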

3.3. Different random numbers on sublattices 
We set

lsd2mpi = .true. and mpifactor = 2

in latmpi.par and produce data at $\beta = 5.65$ from
simulations of $8^3 \times 4$ lattices, which are partitioned
into different numbers of sublattices. Average action
densities are compiled in table 1. The statistics
of each run is the same as before (9). Each Q
value in the table corresponds to the Gaussian
difference test with the action value in the row above.
The number of MPI processes agrees with the number
of sublattices given by (7). The time column
contains the CPU time measured on the Athlon
PC. The non-MPI run for the first row uses the
FHKP heatbath in the repeat until accepted version.
Whether it is slower or faster than the MPI
program run for one process (nf = 1) depends
on the Fortran compiler and the MPI installation.
Here it turns out to be faster, but that is not the
case on the PCs used in [11]. For nf = 1 one can
turn off the boundary exchange in the MPI program
by setting lbcex = .false., indicated by 1F
in the nf column. Results stay number by number
identical and the small speedup is negligible for
practical purposes, indicating that MPI send and
receive is very efficient as long as communication
stays within the same computer node. The decrease
of CPU time from msmpi = 1 to msmpi = 2 reflects
the gain in real time due to using both cores of the
PC. It is given by a factor slightly larger than 1/2
due to communication overhead. The parameters
of these runs are set up in the

08x08y08z04t5p65bnfnd

subfolders of 1MPICH, where the notation is as
introduced in (14).

Table 1

Runs with np MPI processes, mpifactor=nf, ndmpi=n on
a periodic $8^3 \times 4$ lattice at $\beta = 5.65$.

np   nf   n    nl1  nl2  nl3  nl4   time    actm            Q
-    -    -    8    8    8    4     248 m   0.538547 (70)   -
1    1F   -    8    8    8    4     282 m   0.538471 (61)   0.41
1    1    -    8    8    8    4     287 m   0.538471 (61)   1.00
2    2    1    4    8    8    4     147 m   0.538584 (71)   0.23
4    2    2    4    4    8    4     150 m   0.538562 (63)   0.82
8    2    3    4    4    4    4     158 m   0.538468 (81)   0.36
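The statement about the factor slightly larger than 1/2 can be checked against the time column of table 1. The following Python lines simply form the ratios of the CPU times (in minutes) of the MPI runs to the one-process MPI run with boundary exchange; the dictionary name is chosen for illustration.

```python
# np -> CPU time in minutes, taken from the MPI rows of table 1.
cpu_minutes = {1: 287, 2: 147, 4: 150, 8: 158}

ratios = {n: t / cpu_minutes[1] for n, t in cpu_minutes.items()}
for n, r in ratios.items():
    print(f"np = {n}: time ratio {r:.3f}")
```

For np = 2 the ratio is about 0.51. Beyond two processes no further gain is seen in these numbers, which is what one would expect on a PC with two cores.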

Results from $16^3 \times 4$ lattices, which allow for
comparison with spacelike and timelike plaquette
averages of the literature [16], are compiled in table 2.
For our runs the statistics (9) was used again. The
estimates of [16] rely on 20 000 to 40 000 sweeps
after thermalization. After collection of our data by
running (16), the analysis program

anaw_cbsu3.f

estimates the expectation values for spacelike and
timelike plaquettes. Again, Q values correspond to
Gaussian difference tests with the row above. The
slight increase of all values with increasing numbers
of processes is accidental and not reproduced when
using different random number generator seeds.

Table 2

Spacelike and timelike plaquette expectation values for
comparison with Ref. [16] (B in the np column): Runs on a
$16^3 \times 4$ lattice at $\beta = 5.65$ using our non-MPI program and
MPI code with np processes (sublattices).

np   spacelike       Q      timelike        Q
B    0.537638 (17)   -      0.537692 (19)   -
-    0.537650 (15)   0.60   0.537711 (14)   0.42
1    0.537647 (17)   0.89   0.537701 (17)   0.65
2    0.537650 (16)   0.90   0.537704 (17)   0.90
4    0.537648 (16)   0.93   0.537708 (16)   0.86
8    0.537661 (18)   0.59   0.537714 (17)   0.80

3.4. Double-layered torus 

This section deals with verifications for simula- 
tions with 

nlat = 2 

and folders of the runs are kept in 
2MPICH . 

For lat2 = .false. the exchange of boundaries is
turned off and one performs independent runs on
two lattices with PBC. This is not very interesting
and we discuss only runs with lat2 = .true..

With exchange of boundaries turned on, a strong
verification test is to run with different random
numbers on the sublattices of a torus, but identical
$\beta$ values and random numbers on each torus. These
are the parameter options

lsd2mpi = .true. and lat2test = .true. ,

designed to reproduce the action values of the
run with PBC for which statistics and sublattices
match. For tori of size $8^3 \times 4$ and $\beta_0 = \beta_1 = 5.65$
these are the values of table 1. These test runs worked
out as required and are set up in the

T08x08y08z04t5p65bnfnd

folders of 2MPICH, where the initial T indicates that
lat2test is set to true.

Table 3

Runs with np MPI processes, mpifactor=nf, ndmpi=n on
a double-layered $8^3 \times 4$ lattice at $\beta = 5.55$, disordered starts.

np   nf   n    nl1  nl2  nl3  nl4   actm            Q
2    1    1    8    8    8    4     0.510608 (33)   -
4    2    1    4    8    8    4     0.510671 (28)   0.15
8    2    2    4    4    8    4     0.510597 (24)   0.04
16   2    3    4    4    4    4     0.510676 (32)   0.05

Proper simulations on a DLT are performed with
lat2test = .false.. Table 3 gives reference values
for the action at $\beta_0 = \beta_1 = 5.55$ (at $\beta = 5.65$
one needs more statistics due to autocorrelations
that become important for small error bars). The
Q values refer to Gaussian difference tests as in
previous cases. They are a bit on the low side, but
it is clear that simply reordering the comparison of
data would change that. Also runs with a different
compiler gave (in the same order as listed in the
table) Q = 0.97, 0.09, and 0.30.
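The remark on reordering can be made concrete with the numbers of table 3. The Python sketch below assumes the two-sided Gaussian difference test Q = erfc(|x1 - x2| / (sqrt(2) * sigma)) with sigma^2 = e1^2 + e2^2 and compares each row with the row above, once in the order of the table and once in one alternative order (rows 3, 1, 4, 2) picked here purely for illustration.

```python
import math

def gaussian_difference_q(x1, e1, x2, e2):
    """Two-sided Gaussian difference test for two estimates with error bars."""
    sigma = math.sqrt(e1 * e1 + e2 * e2)
    return math.erfc(abs(x1 - x2) / (sigma * math.sqrt(2.0)))

# (actm, error bar) for the rows np = 2, 4, 8, 16 of table 3.
rows = [(0.510608, 0.000033), (0.510671, 0.000028),
        (0.510597, 0.000024), (0.510676, 0.000032)]

def chain_q(order):
    """Q values of each entry against its predecessor in the given order."""
    return [gaussian_difference_q(*rows[i], *rows[j])
            for i, j in zip(order[:-1], order[1:])]

table_order = chain_q([0, 1, 2, 3])  # order as listed in table 3
reordered = chain_q([2, 0, 3, 1])    # illustrative alternative order

print("table order:", [f"{q:.2f}" for q in table_order])
print("reordered:  ", [f"{q:.2f}" for q in reordered])
```

The first chain reproduces the Q column of the table, whereas the reordered chain contains no Q value below 0.1. The data are the same in both cases; only the choice of which pairs are compared differs.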



4. Summary and Conclusions 

The code of this paper allows Markov chain 
Monte Carlo calculations of pure SU(3) lattice 
gauge theory on computers which have MPI in- 
stalled. Besides the usual periodic boundary con- 
ditions, the geometry of a double-layered torus is 
implemented, which allows for distinct inside and 
outside environments (the "inside" of one lattice 
is the other's "outside" and vice versa). A considerable
number of non-trivial verifications are included
in this paper. The CPU time performance
of the code as function of the number of processors
(more precisely CPU cores) is documented in
a companion paper [11].

Should the a-version of the program "hang up"
without producing an error message, the cause is
likely MPI send and receive problems, which are
discussed and partially resolved in [11]. Although
designed for arbitrary D>2 dimensions, the code
has presently only been tested in 4D. Hence, it is
unlikely to work straightaway for other D values
(nd in lat.par), though required fix-ups are expected
to be minor. Of course, it is the responsibility
of the final user to perform stringent tests
before applying the provided code to any purpose.